[Figure: system block diagram. Labels: Speech Database; Speech Analysis; Speech signal; Excitation parameters; Spectral parameters; Training of MSD-HSMM; Context-dependent MSD-HSMMs and duration models; Speech Parameter Generation]
Abstract
This paper describes the text-to-speech synthesis system developed for the Blizzard Challenge 2016 by members of the ADAPT Centre and colleagues from associated projects. The task was to build a synthetic voice for reading audiobooks to children from a speech database of about 5 hours of audiobooks. Our entry is an HMM-based parametric speech synthesizer built from a subset of the database (half of the audiobooks in the full dataset). We used only this subset because it was the best-quality data we could obtain under the time constraints imposed by the Challenge's deadlines. Most of the development work for this challenge went into text chunking, including the splitting of sentences and of quoted text segments, and into the automatic alignment of speech and text data. We also aimed to synthesize emotional speech to make the synthetic voice more expressive. Although we could not complete this task in time for the submission, we plan to continue the work and possibly use it in a future Blizzard Challenge entry.
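The abstract does not say how the text chunking was implemented. As a rough illustration only, the sketch below separates quoted segments from the surrounding narration and splits the narration into sentences with a naive punctuation rule. The function names, the regular expressions, and the "quote"/"narration" labels are assumptions made for this example and are not taken from the paper.

```python
import re

# Hypothetical patterns: straight or curly double-quoted spans, and a
# naive sentence boundary (., ! or ? followed by whitespace).
QUOTE_RE = re.compile(r'"[^"]*"|“[^”]*”')
SENT_END_RE = re.compile(r'(?<=[.!?])\s+')


def split_sentences(text):
    """Split text on sentence-final punctuation followed by whitespace."""
    return [s.strip() for s in SENT_END_RE.split(text) if s.strip()]


def chunk_text(text):
    """Return (kind, chunk) pairs, where kind is 'quote' or 'narration'.

    Quoted spans are kept whole; text between them is split into sentences.
    """
    chunks = []
    pos = 0
    for match in QUOTE_RE.finditer(text):
        # Narration before this quoted span.
        for sentence in split_sentences(text[pos:match.start()]):
            chunks.append(("narration", sentence))
        chunks.append(("quote", match.group(0)))
        pos = match.end()
    # Narration after the last quoted span.
    for sentence in split_sentences(text[pos:]):
        chunks.append(("narration", sentence))
    return chunks


if __name__ == "__main__":
    sample = 'The fox stopped. "Who is there?" he asked. Nobody answered.'
    for kind, chunk in chunk_text(sample):
        print(kind, "|", chunk)
```

A rule this simple breaks on abbreviations, nested quotes, and quotes that span paragraphs, so a real audiobook pipeline would need a more robust tokenizer before alignment.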
Similar resources
Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
A Hidden Semi-Markov Model-Based Speech Synthesis System
Recently, a statistical speech synthesis system based on the hidden Markov model (HMM) has been proposed. In this system, spectrum, excitation, and duration of human speech are modeled simultaneously by context-dependent HMMs and speech parameter vector sequences are generated from the HMMs themselves. This system defines a speech synthesis problem in a generative model framework and solves it ...
Using prosody to improve Mandarin automatic speech recognition
In this paper, these problems of how to model and train Mandarin prosody dependent acoustic model and how to decode input speech based on prosody dependent speech recognition system will be discussed. We use automatic prosody labeling methods to annotate syllable prosodic break type and stress type on continuous speech corpus, and utilize our proposed methods to train prosody dependent tonal sy...
Speaker-Dependent Model Interpolation for Statistical Emotional Speech Synthesis
In this article, we propose a speaker-dependent model interpolation method for statistical emotional speech synthesis. The basic idea is to combine the neutral model set of the target speaker and an emotional model set selected from a pool of speakers. For model selection and interpolation weight determination, we propose to use a novel monophone-based Mahalanobis distance, which is a proper di...
Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners
In this paper we evaluate a method for generating synthetic speech at high speaking rates based on the interpolation of hidden semi-Markov models (HSMMs) trained on speech data recorded at normal and fast speaking rates. The subjective evaluation was carried out with both blind listeners, who are used to very fast speaking rates, and sighted listeners. We show that we can achieve a better intel...
Publication date: 2016